
    Designing similarity functions

    The concept of similarity is important in many areas of cognitive science, computer science, and statistics. In machine learning, functions that measure similarity between two instances form the core of instance-based classifiers. Past similarity measures have been based primarily on simple Euclidean distance. As machine learning has matured, it has become obvious that a simple numeric instance representation is insufficient for most domains. Similarity functions for symbolic attributes have been developed, and simple methods for combining these functions with numeric similarity functions have been devised. This sequence of events has revealed three important issues, which this thesis addresses.

    The first issue concerns combining multiple measures of similarity. There is no equivalence between units of numeric similarity and units of symbolic similarity. Existing similarity functions for numeric and symbolic attributes have no common foundation, so various schemes have been devised to avoid biasing the overall similarity towards one type of attribute. The similarity function design framework proposed by this thesis produces probability distributions that describe the likelihood of transforming between two attribute values. Because common units of probability are employed, similarities may be combined using standard methods. It is empirically shown that the resulting similarity functions treat different attribute types coherently.

    The second issue relates to the instance representation itself. The current choice of numeric and symbolic attribute types is insufficient for many domains, in which more complicated representations are required. For example, a domain may require varying numbers of features, or features with structural information. The framework proposed by this thesis is sufficiently general to permit virtually any type of instance representation: all that is required is that a set of basic transformations that operate on the instances be defined. To illustrate the framework’s applicability to different instance representations, several example similarity functions are developed.

    The third, and perhaps most important, issue concerns the ability to incorporate domain knowledge within similarity functions. Domain information plays an important part in choosing an instance representation. However, even given an adequate instance representation, domain information is often lost. For example, numeric features with a modulo structure (such as the time of day) can be perfectly represented as a numeric attribute, but simple linear similarity functions ignore the modulo nature of the attribute. Similarly, symbolic attributes may have inter-symbol relationships that should be captured in the similarity function. The design framework proposed by this thesis allows domain information to be captured in the similarity function, both in the transformation model and in the probabilities assigned to basic transformations. Empirical results indicate that such domain information improves classifier performance, particularly when training data is limited.
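
    The transformation-based framework itself is not reproduced in the abstract, but the time-of-day example can be illustrated with a small sketch. The following Java class is hypothetical (the names and scaling are assumptions, not the thesis's code); it contrasts a plain linear similarity, which treats 23:00 and 01:00 as nearly maximally dissimilar, with a modulo-aware one that wraps around the period:

    // Hypothetical sketch: a modulo-aware numeric similarity for a cyclic
    // attribute such as time of day, versus a linear similarity that
    // ignores wrap-around.
    public final class CyclicSimilarity {

        // Linear similarity: 23:00 and 01:00 look almost maximally dissimilar.
        static double linear(double a, double b, double range) {
            return 1.0 - Math.abs(a - b) / range;
        }

        // Modulo-aware similarity: distance wraps around the period,
        // so 23:00 and 01:00 are only two hours apart.
        static double cyclic(double a, double b, double period) {
            double d = Math.abs(a - b) % period;
            return 1.0 - Math.min(d, period - d) / (period / 2.0);
        }

        public static void main(String[] args) {
            System.out.printf("linear(23, 1) = %.3f%n", linear(23, 1, 24)); // ~0.083
            System.out.printf("cyclic(23, 1) = %.3f%n", cyclic(23, 1, 24)); // ~0.833
        }
    }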

    Data mining in bioinformatics using Weka

    The Weka machine learning workbench provides a general-purpose environment for automatic classification, regression, clustering, and feature selection: common data mining problems in bioinformatics research. It contains an extensive collection of machine learning algorithms and data pre-processing methods, complemented by graphical user interfaces for data exploration and for the experimental comparison of different machine learning techniques on the same problem. Weka can process data given in the form of a single relational table. Its main objectives are to (a) assist users in extracting useful information from data and (b) enable them to easily identify a suitable algorithm for generating an accurate predictive model from it.
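
    As a concrete illustration, the minimal sketch below runs a ten-fold cross-validated decision tree over a single relational table using Weka's Java API. It assumes a Weka 3.x release on the classpath; the file name genes.arff is a placeholder:

    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class WekaDemo {
        public static void main(String[] args) throws Exception {
            // Load a single relational table in ARFF format (path is a placeholder).
            Instances data = new DataSource("genes.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1); // last attribute is the class

            // Build a C4.5-style decision tree and estimate accuracy by
            // ten-fold cross-validation.
            J48 tree = new J48();
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(tree, data, 10, new java.util.Random(1));
            System.out.println(eval.toSummaryString());
        }
    }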

    Jumble Java Byte Code to Measure the Effectiveness of Unit Tests

    Jumble is a byte-code-level mutation testing tool for Java which inter-operates with JUnit. It has been designed to operate in an industrial setting with large projects. Heuristics have been included to speed up the checking of mutations; for example, noting which test fails for each mutation and running that test first in subsequent mutation checks. Significant effort has been put into ensuring that it can test code which uses custom class loading and reflection. This requires careful attention to class path handling and coexistence with foreign class loaders. Jumble is currently used on a continuous basis within an agile programming environment with approximately 370,000 lines of Java code under source control. This environment checks out the project code every fifteen minutes and runs an incremental set of unit tests and mutation tests for the modified classes. Jumble is being made available as open source.
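
    Jumble's mutations are applied at the byte-code level, but the idea can be shown at the source level. In the hypothetical JUnit 4 sketch below (class and method names are invented for illustration), a mutation that flips '-' to '+' would be killed by the test, since the mutated method returns 105 rather than 95:

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    public class AccountTest {
        // Original code under test.
        static int applyFee(int balance) {
            return balance - 5; // a mutation tool might flip '-' to '+' in byte code
        }

        @Test
        public void feeIsDeducted() {
            // This test kills the '+' mutant: under the mutation the result is 105.
            assertEquals(95, applyFee(100));
        }
    }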

    Correction to: Cluster identification, selection, and description in Cluster randomized crossover trials: the PREP-IT trials

    An amendment to this paper has been published and can be accessed via the original article.

    Experiences with a weighted decision tree learner

    Machine learning algorithms for inferring decision trees typically choose a single “best” tree to describe the training data. Recent research has shown that classification performance can be significantly improved by voting the predictions of multiple, independently produced decision trees. This paper describes an algorithm, OB1, that computes a weighted sum over many possible models. We describe one instance of OB1 that includes all possible decision trees as well as naïve Bayesian models. OB1 is compared with a number of other decision tree and instance-based learning algorithms on some of the datasets from the UCI repository. Both an information gain and an accuracy measure are used for the comparison. On the information gain measure OB1 performs significantly better than all the other algorithms. On the accuracy measure it is significantly better than all the algorithms except naïve Bayes, which performs comparably to OB1.
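
    The abstract does not give implementation details, but the core operation, summing class distributions from many models scaled by per-model weights, might look like the following hypothetical sketch (the interface and names are assumptions, not OB1's actual code):

    import java.util.List;

    // Hypothetical sketch of weighted prediction averaging in the spirit of
    // OB1: each model contributes its class distribution, scaled by a weight.
    public final class WeightedVote {

        interface Model {
            double[] classDistribution(double[] instance); // sums to 1
            double weight();                               // e.g. a posterior model probability
        }

        static double[] combine(List<Model> models, double[] instance, int numClasses) {
            double[] combined = new double[numClasses];
            double totalWeight = 0;
            for (Model m : models) {
                double[] dist = m.classDistribution(instance);
                for (int c = 0; c < numClasses; c++) combined[c] += m.weight() * dist[c];
                totalWeight += m.weight();
            }
            // Renormalize so the combined distribution sums to 1.
            for (int c = 0; c < numClasses; c++) combined[c] /= totalWeight;
            return combined;
        }
    }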

    A diagnostic tool for tree based supervised classification learning algorithms

    The process of developing applications of machine learning and data mining that employ supervised classification algorithms includes the important step of knowledge verification. Interpretable output is presented to users so that they can verify that the knowledge contained in the output makes sense for the given application. As the development of an application is an iterative process, it is quite likely that a user will wish to compare models constructed at various times or stages. One crucial stage where comparison of models is important is when the accuracy of a model is being estimated, typically using some form of cross-validation. This stage is used to establish an estimate of how well a model will perform on unseen data. This is vital information to present to a user, but it is also important to show the degree of variation between models obtained from the entire dataset and models obtained during cross-validation. In this way it can be verified that the cross-validation models are at least structurally aligned with the model garnered from the entire dataset. This paper presents a diagnostic tool for the comparison of tree-based supervised classification models. The method is adapted from work on approximate tree matching and applied to decision trees. The tool is described together with experimental results on standard datasets.
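
    The paper's method adapts approximate tree matching; as a much cruder stand-in, the hypothetical sketch below simply counts aligned internal nodes that test the same attribute, which conveys the flavour of structurally comparing a cross-validation tree against the full-dataset tree:

    // Hypothetical sketch: a crude structural comparison of two decision
    // trees, counting aligned internal nodes that test the same attribute.
    // The paper's method, based on approximate tree matching, is more
    // sophisticated than this.
    public final class TreeCompare {

        static final class Node {
            String splitAttribute; // null at a leaf
            Node left, right;
            Node(String a, Node l, Node r) { splitAttribute = a; left = l; right = r; }
        }

        // Number of aligned internal nodes that test the same attribute.
        static int matches(Node a, Node b) {
            if (a == null || b == null
                    || a.splitAttribute == null || b.splitAttribute == null) return 0;
            int here = a.splitAttribute.equals(b.splitAttribute) ? 1 : 0;
            return here + matches(a.left, b.left) + matches(a.right, b.right);
        }
    }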

    Experiences with OB1, An Optimal Bayes Decision Tree Learner

    In machine learning, algorithms for inferring decision trees typically choose a single "best" tree to describe the training data, although recent research has shown that classification performance can be significantly improved by voting the predictions of multiple, independently produced decision trees. This paper describes a new algorithm, OB1, that weights the predictions of any scheme capable of inferring probability distributions. We describe an implementation of OB1 that includes all decision trees as well as naïve Bayesian models. Results indicate that OB1 is a very strong, robust learner, and make plausible the claim that it successfully subsumes other techniques, such as boosting and bagging, that attempt to combine many models into a single prediction.
    Keywords: option trees, Bayesian statistics, decision trees.
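
    The "optimal Bayes" idea the title alludes to is conventionally written as posterior-weighted averaging over a model class M; a standard textbook formulation (not quoted from the paper) is:

    P(c \mid x, D) = \sum_{m \in \mathcal{M}} P(c \mid x, m)\, P(m \mid D),
    \qquad P(m \mid D) \propto P(D \mid m)\, P(m)

    Under this view, schemes such as boosting and bagging can be read as particular choices of model class and weighting, which is what makes the subsumption claim plausible.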

    Naive Bayes for regression

    Despite its simplicity, the naïve Bayes learning scheme performs well on most classification tasks, and is often significantly more accurate than more sophisticated methods. Although the probability estimates that it produces can be inaccurate, it often assigns maximum probability to the correct class. This suggests that its good performance might be restricted to situations where the output is categorical. It is therefore interesting to see how it performs in domains where the predicted value is numeric, because in this case predictions are more sensitive to inaccurate probability estimates. This paper shows how to apply the naïve Bayes methodology to numeric prediction (i.e. regression) tasks, and compares it to linear regression, instance-based learning, and a method that produces “model trees”: decision trees with linear regression functions at the leaves. Although we exhibit an artificial dataset for which naïve Bayes is the method of choice, on real-world datasets it is almost uniformly worse than model trees. The comparison with linear regression depends on the error measure: for one measure naïve Bayes performs similarly, for another it is worse. Compared to instance-based learning, it performs similarly with respect to both measures. These results indicate that the simplistic statistical assumption that naïve Bayes makes is indeed more restrictive for regression than for classification.
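
    One simple way to adapt naïve Bayes to a numeric target (not necessarily the paper's exact construction) is to discretize the target into bins, apply the usual posterior p(y|x) ∝ p(y) · Π p(x_i|y) over the bins, and predict the probability-weighted mean of the bin centres. The hypothetical sketch below assumes the per-bin priors and per-attribute likelihoods for a query instance have already been estimated:

    // Hypothetical sketch: naïve Bayes adapted to regression by discretizing
    // the target into bins and predicting the posterior-weighted mean of the
    // bin centres. This illustrates the idea only; densities could instead be
    // estimated with kernel methods.
    public final class NaiveBayesRegression {

        // binCentres[b]   = representative target value of bin b
        // prior[b]        = p(y in bin b)
        // likelihood[b]   = per-attribute values p(x_i | y in bin b) for one query instance
        static double predict(double[] binCentres, double[] prior, double[][] likelihood) {
            double[] posterior = new double[binCentres.length];
            double norm = 0;
            for (int b = 0; b < binCentres.length; b++) {
                posterior[b] = prior[b];
                for (double l : likelihood[b]) posterior[b] *= l;
                norm += posterior[b];
            }
            // Expected target value under the (normalized) posterior over bins.
            double expectation = 0;
            for (int b = 0; b < binCentres.length; b++) {
                expectation += (posterior[b] / norm) * binCentres[b];
            }
            return expectation;
        }
    }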